A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection

Abstract

Image-based salient object detection (SOD) has been extensively studied in the past decades. However, video-based SOD is much less explored, since there is a lack of large-scale video datasets within which salient objects are unambiguously defined and annotated. Toward this end, this paper proposes a video-based SOD dataset that consists of 200 videos (64 minutes). In constructing the dataset, we manually annotate all objects and regions over 7,650 uniformly sampled keyframes and collect the eye-tracking data of 23 subjects who free-view all videos. From the user data, we find that salient objects in video can be defined as objects that consistently pop out throughout the video, and objects with such attributes can be unambiguously annotated by combining manually annotated object/region masks with the eye-tracking data of multiple subjects. To the best of our knowledge, it is currently the largest dataset for video-based salient object detection.

Based on this dataset, this paper proposes an unsupervised baseline approach for video-based SOD by using saliency-guided stacked autoencoders. In the proposed approach, multiple spatiotemporal saliency cues are first extracted at the pixel, superpixel and object levels. With these saliency cues, stacked autoencoders are constructed in an unsupervised manner; they automatically infer a saliency score for each pixel by progressively encoding the high-dimensional saliency cues gathered from the pixel and its spatiotemporal neighbors. Experimental results show that the proposed unsupervised approach outperforms 30 state-of-the-art models on the proposed dataset, including 19 image-based & classic (unsupervised or non-deep-learning) models, 6 image-based & deep-learning models, and 5 video-based & unsupervised models. Moreover, the benchmarking results show that the proposed dataset is very challenging and has the potential to boost the development of video-based SOD.
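The abstract's baseline trains stacked autoencoders without supervision and reads the deepest code out as a per-pixel saliency score. Below is a minimal PyTorch sketch of that general scheme: greedy layer-wise reconstruction training that progressively encodes high-dimensional cue vectors down to one dimension. The layer sizes, sigmoid activations, optimizer, and training schedule are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class AutoencoderLayer(nn.Module):
    """One encoder/decoder pair, trained greedily on reconstruction loss."""
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, code_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(code_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        code = self.encoder(x)
        return code, self.decoder(code)

def train_stacked_autoencoders(cues, dims=(64, 16, 4, 1), epochs=50, lr=1e-3):
    """Greedy layer-wise unsupervised training (illustrative sketch).

    cues: (N, D) tensor in [0, 1]; for each of N pixels, the high-dimensional
    saliency cues gathered from the pixel and its spatiotemporal neighbors.
    Returns the trained layers and the final 1-d codes as saliency scores.
    """
    layers, x = [], cues
    in_dim = cues.shape[1]
    for code_dim in dims:
        layer = AutoencoderLayer(in_dim, code_dim)
        opt = torch.optim.Adam(layer.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            _, recon = layer(x)
            loss = loss_fn(recon, x)          # unsupervised reconstruction loss
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                 # freeze and feed codes upward
            x, _ = layer(x)
        layers.append(layer)
        in_dim = code_dim
    return layers, x.squeeze(1)               # per-pixel saliency scores

# Usage with random stand-in cues: scores in [0, 1], one per pixel.
# layers, scores = train_stacked_autoencoders(torch.rand(1000, 256))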
